

BY JEFF UBOIS 


Like almost anything specific enough to define, data garbage has an 
economic identity. At first, it was expensive to create and store, so peo- 
ple took care to eliminate it promptly. Then, as storage and memory 
became cheap, garbage became almost costless to keep, even as it 
became more costly to identify and destroy (because it takes time and 
attention or automated procedures to do so). But suddenly, the prolif- 
eration of tools to find meaning in garbage, combined with an increas- 
ingly litigious, no-problem-without-someone-to-blame society, has 
changed its economics. 


In the essay below, Jeff Ubois explores these dynamics from the 
informed point of view of a practitioner. A Berkeley-based writer, he 
has worked at the Internet Archive, which maintains a 150-terabyte 
archive of Web pages, books, films, and television and radio broadcasts,
on and off since 1996. Among his tasks is removing data in
response to requests by its owners — who want, in effect, to turn their 
tracks into garbage and remove them from the world. 


In 1998, he co-founded Disappearing Inc., now Omniva, which pro- 
vides software that allows users to write e-mail messages in the digital 
equivalent of disappearing ink, and helps users to manage the reten- 
tion, destruction and security of their e-mail. Disappearing Inc. grew 
out of work by others at the Internet Archive that highlighted the need 
for transient communications as well as permanent records.


A native of Washington, DC, Ubois covered the Internet as a freelance 
reporter from 1987 to 1998 for a wide variety of publications, includ- 
ing CFO, Digital Media, Internet World, and Information Week, 
and wrote market research reports for Communications Industry 
Researchers (CIR) and the Economist Intelligence Unit. He also works
on a part-time basis with Ferris Research, a consulting firm specializ- 
ing in collaboration technologies and e-mail. 


Ubois has worked for a number of organizations mentioned in this 
article, including Brightmail and Postini, and this piece tests the 
notion that an accumulation of conflicts amounts to expertise. 

— Esther Dyson 


“Garbage is such a pervasive element in our society — there is a 
garbage angle to every human activity — that it should be hardly sur- 
prising when hopes and plans for garbage disposal give rise to 
unanticipated consequences. Indeed, the interconnections among 
garbage, economic markets, and human behavior are so complex 
that we can almost count on some aspect of even the best-laid plans 
going awry.” — William Rathje, Rubbish! 


Spend an hour surfing the Web using a broadband connection, and 
your system will have cached about 40 megabytes of data, most of 
which is garbage — data you don’t really need or want to keep, and 
which is overwritten once the cache limit is reached. But that cache 
of garbage data says a lot about you, your interests and your activi- 
ties. Our values and identity are reflected as much in what we discard 
as in what we hold dear. Thus Oracle’s use of private detectives to go 
through Microsoft’s trash a few years ago in Washington, DC, a
move that was entirely legal, if ethically suspect. 


“There’s no way to live without throwing things away; it’s not possi- 
ble,” says William Rathje, professor of archeology at Stanford 
University and author of the 1992 book Rubbish!, which established 
garbology as a modern academic discipline. 


Garbage is an “out of sight, out of mind” problem, and for that rea- 
son, garbology provides a window into human life and identity that 
illuminates the distinctions between what people say they do and 
what they actually do. For just that reason, investigations into the 
most damaging US spies of the last 20 years, Aldrich Ames and Robert Hanssen,
involved considerable analysis of trash by law enforcement. 


“The data we don’t want, like spam, says a lot about who we are and what we care 
about,” Rathje says. For garbologists, it’s no coincidence that most spam is about sex 
and money — spammers crudely reflect our wants, and thus assault our self-image. 


Digital garbage 

Once upon a time, it was expensive to store information: Paper was hard to share, 
distribute and manage. It took up a lot of space and was hard to file, index and 
retrieve. Until recently, this was also true of digital information: When IBM introduced
the first hard drive in 1956, it used fifty 24-inch disks that stored five
megabytes, and it cost nearly $40,000 per year to lease. Nonetheless, the computer
industry historically has considered garbage as an afterthought, if at all, focusing 
instead on delivering the right information at the right time to the right person, and 
ensuring that data is preserved. 


To a certain extent, this lack of interest has been driven by economics. Over the past 
ten years, the amount of information in digital format has skyrocketed while the 
price of storage has dropped from $1 per megabyte to under $1 per gigabyte. This 
trend has reduced the incentive to throw away old bits: It’s more expensive to sort 
than to store. 


“We will asymptotically approach the point where it costs more to delete data than 
to keep it because the overhead of going back to do that maintenance will be more 
expensive,” predicts Microsoft CTO Craig Mundie. “Soon, the amount of storage
available will make it economical to maintain your life record; an indexed copy of
everything you’ve read will be doable.”


But suddenly, the proliferation of tools to find meaning in garbage — e-mail trails, 
behavior tracking, data mining and the like — has changed its economics: Garbage 
has again become costly — or “negative-value” — because it is of potential value to 
one’s enemies, whether in litigation, corporate espionage or other kinds of activities 
where one man’s garbage is another man’s delightful discovery. Moreover, new, 
often contradictory, regulations require that it be handled carefully. In short, 
garbage carries risk. 


Garbage causes small but continuous expenses, punctuated with occasional
catastrophic losses. These costs are widely distributed and hard to measure. The cost to
delete one unwanted e-mail is infinitesimal, but the cost of deleting billions of spam 
messages, or of e-mail system outages caused by spam attacks, can be enormous. 
Haphazard disposal practices can result in the leakage of confidential information. 
The cost of retrieving relevant documents such as e-mail messages from hundreds of 
back-up tapes and hundreds of thousands of individual message files during legal 
discovery proceedings is enormous. 


Ernst & Young’s litigation management practice charges $1 to $2 each to review and 
produce relevant messages from e-mail archives. If forensics work is involved, the 
cost can go even higher. Many companies opt to settle lawsuits rather than pay to 
restore and review years of backed-up data — the class action lawsuit against 
American Home Products over its Phen/Fen diet drugs involved 33 million
e-mail messages.
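

To get a rough sense of scale, the per-message rate can be multiplied out. The sketch below assumes, purely for illustration, that every one of the 33 million messages is reviewed at the quoted $1 to $2 rate; the function name is hypothetical, and the actual number of messages reviewed in that case is not stated here.

```python
# Back-of-envelope estimate. Assumption: every message is reviewed
# at the quoted per-message rate.
MESSAGES = 33_000_000        # e-mail messages in the Phen/Fen case
RATE_LOW, RATE_HIGH = 1, 2   # dollars per message, the range quoted above


def review_cost_range(messages: int, low: int, high: int) -> tuple:
    """Return the (minimum, maximum) review cost in dollars."""
    return messages * low, messages * high


low, high = review_cost_range(MESSAGES, RATE_LOW, RATE_HIGH)
print(f"${low:,} to ${high:,}")  # prints "$33,000,000 to $66,000,000"
```

Even at the low end of that range, review alone would cost tens of millions of dollars, which helps explain why settlement can look attractive.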


After-the-fact panic, rather than careful risk management, has been the typical 
approach to garbage problems. It should be no surprise then that many of the com- 
panies working to solve the garbage problem have pricing models similar to insur- 
ance firms (or waste management services): small, regular payments for services that 
reduce the probability of catastrophic losses. 


Handled poorly, data garbage emits a variety of bad smells — wasted resources, em- 
barrassment, security breaches and even the shutdown of companies. Microsoft is a 
case in point. Old e-mail (or multiple conflicting interpretations of it) was the lynch- 
pin of the US government's anti-trust case. “If the only reason to delete data is the 
legal environment, then you have to ask, ‘Is that the right legal policy?’” Mundie asks.


Maybe not — in the post-Enron legal environment, there is no such thing as private 
e-mail. But companies need to protect themselves from some painful and expensive 
issues that have common roots in the way we traditionally handle garbage: 


Data Ownership. One traditional right of ownership is the right to throw something 
away. But a growing fraction of corporate data is not owned in this traditional sense: 
When Andersen threw confidential data in the garbage, the company wasn’t exercis- 
ing legitimate property rights; it was violating the law. Though it’s legal to throw 
away scraps of paper, and phone calls are not usually recorded, organizations are 
increasingly required to retain every scrap of data that passes through their e-mail, 
instant message, telephone and voice mail systems. In the electronic world, the
distinction between transient messages that can be casually thrown away and
business records that must be preserved is disappearing.


WWW.EDVENTURE.COM


Conversely, digital rights management systems enforce ownership rights by trans- 
forming information into gibberish (and vice versa), as determined by the owner. 
“XML-based description languages with rights management built in, for documents 
for example, may have expiration dates so disposition is specified consistently,” 
Mundie says. 


Spam. The most obvious instance of the Net’s garbage problem, spam was ranked as 
the number-one concern among IT managers in a recent survey by IDC, and Ferris 
Research recently calculated that spam will cost corporations billions of dollars in 
lost productivity over the next few years. 


“There’s been a 150 percent increase in the amount of spam in the last year, and if 
companies don’t solve their spam problems, the value of their e-mail systems will be 
significantly reduced,” says Scott Petry, co-founder and VP of products and engineering
at Postini (SEE PAGE 22). “If you want to see the future of e-mail, look at what happened
to Usenet: It once was a very valuable way for people on the Net to
communicate, but spam completely killed it.” 


Security, confidentiality and privacy. Over the past few years, paper shredders have 
become a common consumer item, primarily because of fears of identity theft. But 
the quantity of paper shredded is unlikely to equal the amount of data on the more 
than 100 million hard drives that will be thrown away this year, many of which will 
contain recoverable confidential data. The consequences of haphazard disposal prac- 
tices can be extreme: When the Department of Justice mistakenly auctioned off com- 
puters containing the names of people in the Witness Protection Program in the early 
1990s, it was forced to relocate people whose whereabouts had been compromised. 


Other vivid examples of garbage as a security threat can be found among the stories 
of long-buried e-mail returning to haunt its creator. As Ollie North, Bill Gates, and 
Bill Clinton all discovered, e-mail is nearly impossible to throw away. For just that 
reason, within days of being elected, George Bush announced that he would not use 
e-mail while in office. 


Defining garbage: A matter of taste?

A key unresolved question is how to define garbage, which is surprisingly hard to 
answer: We may know it when we see it, but we can’t manually examine terabytes of 
information or predict how what seems irrelevant now may become important. In a 
sense, defining garbage requires changing unmanaged data into managed data — 
always an expensive proposal. And once garbage is defined as such, the options for 
its disposal aren’t always easy to execute. In the physical world, there are four 
approaches: toss it, burn it, recycle it or practice what is called “source reduction,” i.e.,
reduce at the source the amount of material that will end up as solid waste.


In some contexts, there are precise definitions of what data should or shouldn’t be 
thrown away, but usually there is a lot of leeway. “That Ford up on blocks in your 
yard, your neighbors might think of as trash, even though you know it’s really a pro- 
ject,” says Ben Gross, a visiting scholar at the School of Information Management 
and Systems at the University of California, Berkeley. “Or that stuff you think of as 
‘memories’ might be garbage to your spouse.” One of the challenges for anyone try- 
ing to develop rules for taking out the trash is to translate those subjective judg- 
ments into logical routines. 


In 1948, Bell Telephone Company researcher Claude Shannon introduced the con- 
cepts of noise, feedback, information loss, message fidelity, and the term “bit” (a 
shortening of the words binary digit) in an article called “A Mathematical Theory of
Communication” and at greater length in a book co-authored with mathematician
Warren Weaver. For information theorists such as Shannon, garbage is the “noise” in 
the “signal to noise” ratio. Noise is the opposite of information, and it is defined as 
anything added to the signal not intended by the source. 


Cryptography can be viewed as a mechanism for preserving secrecy by adding noise 
to a communications channel, an issue Shannon explored in a later paper. As with 
other types of garbage, it’s sometimes possible to recover some residual value from 
encrypted communications — a little remaining signal in the noise, such as traffic 
patterns of encrypted communications, unconscious disclosure of real feelings 
through body language, background sounds that betray a person’s whereabouts, or 
unintentionally forwarded e-mail. Such analysis is the digital equivalent to scaveng- 
ing and recycling. 


But for most of us, noise is more what we don’t want to hear than what we
don’t want to say. A more common approach to garbage management, used by
records managers for decades, is based on the concept of the information lifecycle.


A STACK OF GARBAGE


Taxonomies have been a recent theme in these pages, and 
with some modification, the old seven-layer International
Organization for Standardization/Open Systems Interconnection
model for networked computing, which starts at the physical
layer, and moves up through networks, systems, and
applications, offers an interesting way to enumerate many 
disparate garbage issues. 


The physical layer: Old equipment - More than 100 million 
disk drives will be thrown away this year, and most of them 
are likely to contain sensitive data that is easily recover- 
able. Given some of the new laws intended to safeguard 
privacy (SEE BOX, PAGE 9), this is a serious potential civil 
and criminal liability, and the inadvertent or careless dis- 
posal of old equipment has already created a whole genre 
of cautionary tales. 

In 2000, Sir Paul McCartney's financial records 
were recovered from a PC that investment bank Morgan 
Grenfell Asset Management sold at auction. In 2002, the 
US Department of Veterans Affairs, which had policies in 
place regarding computer disposal, failed to wipe data 
from 139 PCs containing individuals’ medical records and 
other sensitive data, such as credit card information. 
Other stories about data recovered from the machines of 
banks, hospitals, government departments and law firms 
are widely referenced on the Net. 

In the long run, file systems that automatically 
encrypt everything as it is written to disk may help solve 
this problem. Windows XP and 2000 already provide this 
capability, though it is seldom used. A number of stand- 
alone products, such as Scramdisk, E4M, and PGPdisk, do 
this as well. For frequent travelers with sensitive data on 
their laptops, these utilities are a good idea. 

On the Net: Garbage in transit - Most digital data spends 
part of its time as a packet. While this garbage tends to be 
more transient, it can still be a problem. For example, 
while network caches can improve performance by moving 
data closer to end users, they may serve up obsolete data. 
Distributed denial-of-service attacks bury targeted
servers in a heap of garbage packets. Dropped packets 
and network errors can also slow the network to a crawl.

On the server - Server processes generate different log
files that detail events and gradually expand to fill what- 
ever space they are allowed. This presents both problems 
and opportunities for utilities companies. The downside of 
these files is that they present a legal risk to corporations, 
and a potential threat to the privacy of individuals. 
Privacy guarantees on Websites may be violated inadvertently
through the unmanaged operation of logging functions.
Amazon's Alexa subsidiary was fined by the FTC for
just this reason. As we go to press, the Recording Industry 
Association of America is suing Verizon for access to log 
files that might reveal the identities of file traders through 
IP addresses. Privacy advocates say that if Verizon is 
forced to turn over its logs, private individuals alleging 
copyright violations will have more power to monitor com- 
munications than the police (who must obtain warrants). 

As with old e-mail, which is both archived and 
destroyed, there have been two responses. One is to mine 
these log files for information about system security, 
usage patterns and end-user demographics. Companies 
such as WebTrends, Accrue, SPSS and CoreMetrix take 
this approach. The other response, spearheaded by the No 
Logs Network, has been a grassroots effort to eliminate
or protect these files entirely. 

On the desktop - Operating systems generate massive
amounts of data in swap, temporary and system files,
most of which is never used. As software is installed, 
updated and removed, various files are left behind. And as 
this cruft (defined in the New Hacker’s Dictionary as n.
Excess; superfluous junk; used esp. of redundant or 
superseded code) accumulates, systems gradually slow 
and begin to exhibit anomalous behavior, which is usually 
cured by re-installing the operating system. 

While all operating systems have a need for 
garbage collection, most don't do it very well. This has 
created a small market for utilities vendors that specialize 
in cleaning up unneeded files. Products such as 
Symantec's SystemWorks and Network Associates’ 
McAfee QuickClean address this for Windows users, while 
OmniGroup's OmniDiskSweeper and Aladdin Systems’
Spring Cleaning (which also eliminates browser cache
files) do this for Mac users.

Applications level garbage - E-mail gets most of the 
attention, perhaps because e-mail garbage is the most 
pervasive and often the most toxic. But managing the life- 
cycle of old database records - ensuring that they are 
updated, and purging them when they expire - was an 
issue before e-mail even existed. Similarly, managing ver- 
sion control in document management systems has 
proven to be a problem that is easier to describe than to 
solve. Instant messaging combines a high noise level with 
occasional indiscretions that most users don't expect to 
be recorded. 


Records Management and the Information Lifecycle


Your file was so big. 
It might be very useful. 
But now it is gone. 


— from the Salon.com error message haiku contest 


Lifecycle management is a classic approach to controlling large collections of paper 
and electronic documents. Developed by specialists in library science and records 
management, the lifecycle approach is favored by large organizations to ensure care- 
ful, orderly preservation of information and future access on demand. 


The lifecycle model parallels the way we handle physical garbage, describing how 
information is created, distributed, stored and eventually disposed of using rules 
that define retention periods for various classes of documents. One of its strengths is 
that its rules are media-independent, which means they can be applied to paper doc- 
uments, electronic files or clay tablets. Another strength is that the rules can be 
applied by people (or programs!) with no subject-matter expertise, which makes 
them scalable and extensible. 


Garbage disposal is important in the lifecycle model because it is assumed that data 
preservation and access are expensive. That makes disposal vital for reducing costs, as
well as for preserving confidentiality and reducing search times. In large organiza- 
tions, the rules governing data retention can be incredibly complex. The National 
Archives and Records Administration, for example, has detailed schedules for hun- 
dreds of different types of documents. 


Most often, information reaches the end of its life as a function of time. Federal law 
requires that tax records be retained for seven years, a state law may require
personnel records be kept for three, and a typical corporate policy may limit e-mail 
retention to 90 days. On a more individual level, people may throw out the previous 
year’s calendar each January, or purge their mailboxes once a month. 


Another common trigger for data disposal is an event. For example,
all the papers related to a contract negotiation may be shredded once the contract is
finalized, and only the contract itself is retained.
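

The time-based and event-based triggers described above can be sketched as a small rule table. This is a minimal illustration, not a real records schedule: the class names, the `Record` type and the `legal_hold` flag are hypothetical, the periods are simply the examples given in the text, and years are approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Retention periods from the examples in the text; class names are hypothetical.
RETENTION = {
    "tax_record": timedelta(days=7 * 365),        # seven years (approximate)
    "personnel_record": timedelta(days=3 * 365),  # three years (approximate)
    "email": timedelta(days=90),
}


@dataclass
class Record:
    doc_class: str
    created: date
    closed_by_event: bool = False  # e.g. negotiation papers once the contract is final
    legal_hold: bool = False       # assumption: a hold overrides every other rule


def is_disposable(record: Record, today: date) -> bool:
    """Apply the rules without needing any subject-matter expertise."""
    if record.legal_hold:
        return False
    if record.closed_by_event:
        return True
    period = RETENTION.get(record.doc_class)
    if period is None:
        return False  # unknown class: keep it until someone classifies it
    return today - record.created >= period
```

Because the function looks only at a record's class, age and flags, it could be applied by a clerk or a program alike, which is the scalability the lifecycle model promises.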


THE LEGAL DEFINITION OF TRASH


Over the past few years, a number of new laws have been 
passed that affect how people manage data trash: 


Sarbanes-Oxley Act. Passed in 2002 in response to corporate
accounting scandals, the “SOX” legislation requires
that companies implement stringent records management 
procedures. While destruction of records within a standard 
corporate policy is still permitted, it is a felony to know- 
ingly destroy or create documents that impede, obstruct 
or influence any existing or even any contemplated federal 
investigation. Companies must now preserve any records 
that might conceivably become relevant - IM transcripts, 
voice-mail records, old e-mail, and so on. 


Health Insurance Portability and Accountability Act of 
1996 (HIPAA). HIPAA limits the use and disclosure of indi- 
vidually identifiable health information, and requires care- 
ful disposal of patient records. 


Gramm-Leach-Bliley (GLB) Act. GLB requires financial 
institutions to protect the security and confidentiality of 
their customers’ nonpublic personal information. As with 
HIPAA, in practice this means that documents can't sim- 
ply be put in a dumpster; they need to be shredded or 
erased (if on electronic media). 


Regional and state laws on data destruction. California 
recently passed a law that requires that “A business shall 
take all reasonable steps to destroy, or arrange for the 
destruction of, a customer's records within its custody or 
control containing personal information which is no longer 
to be retained by the business by (1) shredding, (2) eras- 
ing, or (3) otherwise modifying the personal information in 
those records to make it unreadable or undecipherable 
through any means.” 

A Wisconsin law says that no financial institution, 
medical business or tax preparation business may dispose 
of records containing personal information unless they are 
destroyed, and a similar law in Georgia even provides 
criminal penalties. 

The European Data Protection Act covers prac- 
tices related to confidentiality. A core principle of the Act 
is that data is “not kept longer than necessary.” 

In addition to laws regarding data management, 
it's also worth noting the plethora of anti-spam laws. 
Though federal legislation has stalled, 26 states now have 
anti-spam legislation, and additional states are expected 
to pass anti-spam laws this year. Faced with the prospect 
of complying with a multitude of state laws, the Direct
Marketing Association, which in the past opposed federal 
legislation, is now pushing for it, as is Microsoft. 


And sometimes data reaches the end of its lifecycle unintentionally as a result of
system failures. When transmissions are garbled, messages arrive too late, storage fails,
passwords are forgotten, media decays or hardware breaks, data may be transformed
immediately into garbage.


A legal interlude 


As Iron Mountain (below) knows, there’s no way to apply the lifecycle approach in
corporate and other organizations without referencing the laws that define what you
can, can’t and must classify as garbage. Unfortunately, “garbage” doesn’t have a single
legal definition, which means that anyone contemplating either saving or deleting
data faces risk and uncertainty.


According to Donald Skupsky, president of the Information Requirements
Clearinghouse, a Denver-based records-management consulting company, more
than 2,500 federal, state, and local laws govern data retention and destruction (SEE
BOX, PAGE 9), and the companies working to build products that help customers meet
legal requirements admit they are trying to hit a moving target. 


While some laws mandate new levels of data retention, others now require careful 
disposal and destruction of data. “It’s a confusing message: On the one hand, there 
are now anti-shredding laws,” says Bob Johnson, president of the National 
Association for Information Destruction. Inadvertent or deliberate destruction of 
relevant evidence — spoliation — can result in severe penalties, or summary judg- 
ments against the offending party. “On the other, Wisconsin, California and Georgia 
require that certain information be shredded before it’s discarded, mainly to prevent 
identity theft.” 


Given the invasiveness of legal discovery options, companies are increasingly con- 
cerned about retaining electronic data. “The reality is that any communication cre- 
ated by a business — even a note on a cocktail napkin — can come back to haunt you,” 
Johnson says. “We think we have a good system to defend your trade secrets and pro- 
tect confidentiality, but even someone with good intentions can get into trouble.” 


When disputes turn ugly, there may be forensic investigations in which the erased 
portions of disk drives are examined for evidence. Demand for these services is 
booming; while large-scale forensics investigations are often conducted by account- 
ing firms such as Ernst & Young or Deloitte & Touche, there are also firms that spe- 
cialize solely in computer forensics, such as Computer Forensics and Electronic 
Evidence Discovery, both of Seattle, WA, and Kroll Ontrack of Eden Prairie, MN. 


Iron Mountain: Applying the lifecycle model 

Founded in 1951, Iron Mountain is the world’s largest records management compa- 
ny. Its approach to information lifecycle management illustrates the power of that 
model, and points to economic opportunities in the transformation of unmanaged 
data into managed data — and eventually into shreds. 


The numbers tell the story. In 2002, Iron Mountain’s revenues increased to $1.32 bil- 
lion from $209 million in 1997, as new regulatory requirements, legal risks, and the 
migration from paper-based to digital records management systems have forced 
companies to take another look at both preservation and destruction of old data. 
Iron Mountain maintains more than 80 million pieces of computer media, 50 mil- 
lion pieces of microfilm, and 232 million cubic feet of paper records in 650 world- 
wide facilities with 47 million square feet of storage space. 


IRON MOUNTAIN INFO
Headquarters: Boston, MA
Founded: 1951
Employees: 11,000
Funding: NYSE: IRM
URL: www.ironmountain.com
Key metric: $1.32 billion in revenues


“Records management is becoming risk management and is becoming a ‘C-level’
issue as a result,” says Jim Cuff, Iron Mountain’s VP of information
technology for digital archiving. “It’s not just a facilities guy who is
worried about this now, it’s the CEO and CFO who are personally liable
for their company’s financial reporting policies. With e-mail and
instant messaging, hallway conversation becomes a formal record.”


Iron Mountain is applying information lifecycle management techniques
developed for paper records to digital data in an ambitious
plan to dominate the market for electronic records management.
The company has invested $50 million in the last two years in its 
new Digital Archive Service, and will invest another $20 million this year. It expects 
its Digital Archive Service will be a $200-million business within four years. 


As part of this effort, the company has partnered with Oracle, Sun, EMC, and Veritas 
for back-end database software and storage hardware, and has developed software to 
manage data capture, indexing, authentication and management. Internally, the 
company has more than 120 developers who have written code to automatically 
index and digitally sign incoming data. 


Matt Kivlin, product marketing manager for the archiving service, says the Digital 
Archive Service provides “a hierarchy of management control over data,” imple- 
mented through a combination of software and procedures followed by Iron 
Mountain staff. At the base level is indexing that allows fast retrieval on demand. 
One level up is retention management — knowing what to throw away when, based 
on schedules created as documents are stored. Above that is auditing and monitor- 
ing to ensure retention procedures are followed, to prevent tampering and to ensure 
authenticity with both digital signatures and write-once read-many optical media. 
At the top of the hierarchy is long term or perpetual storage — every ten to fifteen 
years, data is transferred to new media. 
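

The lower levels of that hierarchy (indexing, retention schedules set at storage time, and authenticity checks) can be sketched in a few lines. This is an illustration under stated assumptions, not Iron Mountain's software: the `Archive` class and its method names are hypothetical, and a keyed hash (HMAC-SHA256 from the Python standard library) stands in for a true public-key digital signature.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical demo key; a real archive would sign with a protected private key.
SECRET_KEY = b"demo-key-not-for-production"


class Archive:
    def __init__(self):
        self.index = {}  # document id -> content plus management metadata

    def ingest(self, doc_id: str, content: bytes, stored_on: date, keep_days: int) -> None:
        """Index a document, fix its disposal date, and record an integrity tag."""
        tag = hmac.new(SECRET_KEY, content, hashlib.sha256).hexdigest()
        self.index[doc_id] = {
            "content": content,
            "stored_on": stored_on,
            "dispose_after": stored_on + timedelta(days=keep_days),
            "tag": tag,
        }

    def is_authentic(self, doc_id: str) -> bool:
        """Audit step: recompute the tag and compare it with the stored one."""
        entry = self.index[doc_id]
        expected = hmac.new(SECRET_KEY, entry["content"], hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, entry["tag"])

    def due_for_disposal(self, today: date) -> list:
        """Retention step: list documents whose scheduled disposal date has passed."""
        return sorted(d for d, e in self.index.items() if today > e["dispose_after"])
```

Setting the disposal date at ingest, rather than deciding later, is what lets a retention policy be applied consistently instead of case by case.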


While the indexing, retrieval, and access capabilities provided by the Digital
Archive Service will provide much of the value to customers, information destruction
is an important selling point. Ken Rubin, Iron Mountain’s executive VP of marketing,
says that consistent application of retention policies is critical to legal risk
management and compliance, and to reducing storage costs. “The worst thing that 
can happen is if your retention policy is made to look arbitrary or inconsistent, and 
that is what plaintiffs’ attorneys try to do,” Rubin explains. “The claim is that if you
are doing something arbitrary or ad hoc, you must be playing games.” 


To date, destruction of electronic records has not been a major line of business, but
the company earns over $40 million per year by shredding 200,000 tons of paper and 
archival records per year for 18,000 customers. It has recently been on a buying 
spree, acquiring 16 regional shredding companies in the last few years. 


Again, legal compliance is a major driver for this business. “Healthcare organizations 
must protect patient confidentiality, so we are seeing an increase in the business for 
secure shredding in health care,” Cuff says. “We have about 650 records centers, and 
wherever there is a records center, our goal is to also have a shredding facility.” 


Digital Waste Management 


Once data is defined as garbage, there are a number of ways to dispose of it. All four
physical-world approaches (toss it, burn it, recycle it or “source reduction”) are now
being used to handle digital garbage. Tossing it — leaving it on old disk drives, or 
dropping files in the Windows recycling bin — is the most common approach, but it 
doesn’t add value or remove legal risk. However, there are a number of companies 
that are offering the digital equivalent to incinerating, source control and recycling. 


Vendors of wipe disk utilities, e-mail that expires and digital rights management 
might be said to incinerate data that is no longer wanted. Anti-spam efforts, as well 
as the tendency to avoid e-mail for sensitive matters, are all about source control. 
Open-source development and Web services that make available legacy systems and 
technologies, are equivalent to recycling. And in the world of data, there’s a fifth 
option: save everything, at least for a few years. That’s the approach taken not just by 
Iron Mountain, but also by companies like Zantaz and FaceTime. 


Zantaz: Indexing corporate e-mail 

In December 2002, the SEC fined Goldman Sachs, Salomon Smith Barney, Morgan 
Stanley, Deutsche Bank Securities and Piper Jaffray a total of $8.25 million for failing 
to preserve e-mail communications, as required by law. 


That was good news for Zantaz, which was founded to help financial services firms 
comply with federal records management laws, particularly SEC rules that mandate 
e-mail and instant message retention. After several slow years during the dot-com 
boom, the recent business accounting scandals have helped the company land contracts
with some of the largest financial and healthcare firms in the country, including
E-Trade and Bank of America Securities. The company has expanded from just
storage into discovery services for archived e-mail, documents, and instant messages. 


“Our business is growing at an incredible clip right now,” says Roger Erickson, 
Zantaz’s vp of technology solutions and services. “There’s a confluence of events and 
an abundance of litigation that has caused people to realize the downside of not 
being in compliance.” 


Zantaz customers pipe the output of their e-mail systems to the company’s storage 
facility, where it is time stamped, indexed and digitally signed. Companies may also 
pipe the output of instant messaging systems such as FaceTime, Akonix, IMlogic 
(SEE RELEASE 1.0, MARCH 2003) and others directly to Zantaz. The recent increase in lit- 
igation has also caused the company to develop a lab capable of rapidly restoring 
large quantities of backup tapes, a vital service for companies suddenly confronted 
with a discovery request. 
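
The ingestion step described above (time stamp, index, sign) can be sketched roughly as follows, using only the Python standard library. The names `ArchiveStore`, `ingest`, `verify` and `search`, the in-memory index, and the use of an HMAC as a stand-in for a true digital signature are all assumptions made for illustration; none of this reflects how Zantaz actually built its system.

```python
import hashlib
import hmac
import re
import time
from collections import defaultdict

# Hypothetical signing key. A production archive would more likely use an
# asymmetric digital signature than a shared-secret HMAC.
SIGNING_KEY = b"demo-archive-key"


class ArchiveStore:
    """Toy archive: time-stamps, signs and indexes each message."""

    def __init__(self):
        self.records = []               # archived records, in arrival order
        self.index = defaultdict(set)   # word -> ids of records containing it

    @staticmethod
    def _payload(stamp, sender, subject, body):
        return f"{stamp}|{sender}|{subject}|{body}".encode("utf-8")

    def ingest(self, sender, subject, body, now=None):
        stamp = time.time() if now is None else now
        payload = self._payload(stamp, sender, subject, body)
        signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        record_id = len(self.records)
        self.records.append({
            "id": record_id, "stamp": stamp, "sender": sender,
            "subject": subject, "body": body, "signature": signature,
        })
        # Index header fields and full text together.
        for word in re.findall(r"\w+", f"{sender} {subject} {body}".lower()):
            self.index[word].add(record_id)
        return record_id

    def verify(self, record_id):
        """Recompute the signature to detect tampering after ingestion."""
        r = self.records[record_id]
        payload = self._payload(r["stamp"], r["sender"], r["subject"], r["body"])
        expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, r["signature"])

    def search(self, *words):
        """Return ids of records that contain every query word."""
        sets = [self.index.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*sets)) if sets else []
```

The signature is what lets an archive later argue that a record has not been altered since it arrived; the inverted index is what turns a pile of stored mail into something that can answer a discovery request.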


The company has about 60 terabytes of e-mail in its archives now, and Erickson says 
that is growing by seven to eight terabytes per month. Zantaz has thus been a huge 
beneficiary of Moore’s Law, which improved processing and storage by more than an 
order of magnitude since the company was founded. 


Like Google, Zantaz has parallelized its search technology to improve performance.
Searches can be conducted using header information, the full text of a message and
associated attachments, or custom metadata fields determined by the customer. The
company can now complete complex searches for old e-mail in under five seconds,
whereas earlier versions of its products could take hours to return results (better
than looking through printed records, but still slow). For one client, the company
is providing litigation support for discovery involving 17 terabytes worth of e-mail.


ZANTAZ INFO

Headquarters: Pleasanton, CA
Founded: 1996
Funding: undisclosed
Key metric: 60 terabytes of e-mail in its archives
URL: www.zantaz.com


To date, the company has focused on preservation and access, but the issue of disposal
is looming. “It is important to have timely destruction: There is nothing to gain by
keeping information longer [than needed],” Erickson says. But, we ask, is there
something to lose? “It’s hard to quantify the risks of keeping it; firms are struggling
with that now,” he answers.


Though the company has developed software and procedures to manage destruction,
Erickson says they haven’t been used yet; companies are only now coming up against
the end of their retention periods. “It’s a very complicated issue: you have to
destroy optical platters, backup tapes and data that is online.”


Longer term, the company is looking to expand beyond compliance and discovery 
services into knowledge management. “The next level is to mine the information in 
our systems,” says David Greene, director of market solutions — financial services. 
“That will be useful to any firm.” 


That segue into knowledge management is also being pursued by firms like
Cataphora (SEE RELEASE 1.0, MARCH 2003), which is exploiting the market for discovery
services by knitting together conversations that span different media (e-mail, IM
and documents), and by others in the e-mail archiving market, including KVS.


FaceTime: HNIDWTGP! (Chat room abbreviation for “Heck no, I don’t want to go private!”)


“My personal belief is that Enron stock is an incredible bargain at current 
prices, and we will look back a couple of years from now and see the great 
opportunity that we currently have...The company is fundamentally sound. 
The balance sheet is strong. Our financial liquidity has never been stronger. 
And we again have record operating and financial results.” 
— Enron ceo Ken Lay, during an online chat session with employees in 
September 2001 — less than three months before the company filed for 
bankruptcy. 


It’s comments like the above that make logging IM conversations so important to
those who want to improve corporate accountability. But the cost to employee privacy,
and the potential for increased corporate liability, will be enormous.


Instant messages were clearly intended to be ephemeral, and therefore private in a 
way that e-mail is not. But that’s not true anymore, especially in financial services, 
where SEC rules require companies to log IM conversations. “If employees want pri- 
vate communications, the IM system is not the way to do it,” says Michael Overly, a 
partner with Foley & Lardner in Los Angeles.


WWW.EDVENTURE.COM 


IS GARBAGE A PRIVATE MATTER? 


Historically, courts have ruled that when you put garbage 
out for collection, you surrender not only any claim of 
ownership, but any right to privacy as to its contents. 

In California v. Greenwood, the Supreme Court
found that there is no right to privacy in discarded 
records. That's been a powerful tool for law enforcement 
and a boon to the paper-shredding industry, which notes 
that the disposal of unshredded material can now result in 
the legal loss of trade secrets. 

But in December 2002, after Portland city officials
strongly criticized a local judge who had ruled against
the legality of such warrantless searches, reporters for
the Willamette Week infuriated the mayor, city attorney 
and police chief by retrieving the garbage each one left on 
the curb. The mayor's public response: “I consider 
Willamette Week's actions in this matter to be potentially 
illegal and absolutely unscrupulous and reprehensible.” 

The connection between garbage, personal identity,
and privacy is complex. While deliberate recovery of
old e-mail messages and data from old hard drives has
made most of the headlines, data retention is rapidly 
becoming a new battleground for privacy advocates. 

New laws, particularly in Europe, require commu- 
nications service providers to capture and store informa- 
tion concerning the communications habits of their users. 
Ordinarily, most of this data would be considered garbage, 
but not anymore. 

Just as the trash put out on the sidewalk by a 
criminal suspect can turn into “Exhibit A” when it is
scooped up by investigators, the data we all generate 
through our e-mails, phone calls and transactions is being 
transformed from garbage into potential evidence. Short 
of tapping every call, cataloging data garbage may prove 
to be one of the most intrusive forms of surveillance ever. 

While data retention can provide needed account- 
ability, if throwing away data gradually becomes a criminal 
act, as it will under the new data retention laws, what we 
may have thrown away is our right to privacy. 


Into this new legal environment have stepped a number of companies, including
FaceTime, IMlogic, Cordant and Bantu, offering logging, archiving, and other
services that help companies comply with the law.


“We asked ourselves, ‘What are the disruptive technologies that impact business
communications, and where would dollars be spent on that?’” FaceTime co-founder
Mehdi Maghsoodnia says. Instant messaging was the answer, and the company 
began by offering a CRM solution compatible with all the major IM clients and ser- 
vices, allowing companies to offer live tech support on their websites. 


“We handle the management, queueing, routing, forwarding, transferring, billing, 
reporting and chargeback,” explains ceo Glen Vondrick. “We’re doing [for instant
messaging] all the things an organization can do with e-mail or a telephone switch.” 


“You can use our system and take ‘calls’ from AOL, Yahoo and MSN [instant messag- 
ing clients],” Maghsoodnia says. “We are not simply a logging tool, we are a full-state 
proxy, and we actually maintain all the communications with the server in real time.” 


The company built AOL’s AIM Enterprise Gateway, and has partnerships with 
Hewlett-Packard, IBM, Microsoft and Reuters to support business use of IM.


FACETIME INFO

Headquarters: Foster City, CA
Founded: January 1998
Employees: 50
Funding: undisclosed amount from BA Venture Partners, TH Lee Putnam Ventures
and Sutter Hill Ventures
Key metric: 50 enterprise accounts
URL: www.facetime.com


In September 2001, the company introduced IM Auditor, which logs all IM conversations
within an organization. The system has been a hit on Wall Street. The company had 50
enterprise accounts by the end of 2001; current customers include Bank of America,
Barclays, Citigroup and Wachovia Securities. “Within the financial community, they
have to use it, because there is a lot of pressure by SEC on auditing requirements,”
Maghsoodnia says.


IM Auditor provides multiple levels of monitoring. The least intrusive is network
traffic monitoring. More intrusive is ad-hoc real-time monitoring of conversations.
Potentially more intrusive still is long-term or permanent archiving of all
conversations. “Most firms now believe it is in their interest to have an accounting
of IM conversations,” Vondrick says. “We offer real-time monitoring as well, but no
one has wanted that. They want an auditable record that is searchable.”


Though IM Auditor inserts notifications to users that conversations are being 
logged, the privacy implications aren’t lost on FaceTime executives. “The liberal 
thinking on this is losing,” Maghsoodnia says. 


Like video cameras in public places, logging data that usually isn’t reviewed may be 
more about preventing crime through universal surveillance than busting “perps.” 
As Vondrick points out, “logging and auditing eliminates garbage. People in cubes 
are more productive because they don’t want to be heard doing things that are not 
work related.” But as use of the system spreads, look for Wall Street’s bistros to fill 
with executives in search of face time that isn’t logged. 


Incineration and Shredding 


While there are a variety of small utilities intended to help end users destroy old 
data, the enterprise-class products focus on incinerating e-mail, which is arguably 
the most dangerous of the Net’s garbage problems. Messages are accreting at the rate 
of billions of copies per day, and there is essentially no way to get rid of them once 
they are in the wild. 


Sending an e-mail is an irrevocable act. There is no “oops” button. Multiple copies of 
every message are created and stored on senders’ machines, recipients’ machines,
and on multiple mail servers. It’s as if all of us have voluntarily bugged our offices,
and made transcripts available to anyone with an inclination to sue. 


“The preservation and discovery of computer-deleted material has forced compa- 
nies and prudent individuals to severely curtail the practice of using e-mails for all 
but the most innocuous materials,” writes Judge James M. Rosenbaum of the United 
States District Court for the District of Minnesota, in his paper “In Defense of the 
Delete Key.” “Any other course of action subjects the computer user to long-term
liability for idle thoughts... We are, instead, enforcing a dangerous self-censorship over
our ideas and expressions.” 


While Iron Mountain, Zantaz and Cataphora deal with this problem by making it 
cheaper and easier to sort through and limit discovery requests, other companies are 
working on solutions that transform e-mail back into an ephemeral mode of com- 
munication by implementing a simple form of digital rights management that 
allows companies to automatically enforce e-mail security and retention policies. 


The basic approach works like this: When a user hits the “send” button, the outgoing 
message and attachment are encrypted using a symmetric key fetched from a server 
(often called a “policy server”). To open the message, the recipient must retrieve the
same key from the sender’s server, which implements policies assigned by the sender 
or his company. 


Senders can specify who has access to the key, how long the key will be available, 
whether the information can be forwarded or printed, and so on. Senders can check 
to see if the key has in fact been retrieved from the server by the recipient, indicating 
that the message has been read. When a certain amount of time has gone by, an event
has occurred, or the sender simply wishes to destroy all instances of a message,
access to the key is removed, again according to a user-determined policy. 


Similarly, corporations can set global policies for groups of users about key access 
and retention (e.g. all message keys in the HR group should be kept for a year, all 
message keys in accounting should be kept for seven, no key access should be 
allowed outside the corporation, and so on.) 
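
The key-server approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor’s implementation: `PolicyServer`, `send`, `read` and `RETENTION_SECONDS` are hypothetical names, and the cipher is a deliberately simple hash-based toy (not secure) so that the example needs nothing beyond the standard library.

```python
import hashlib
import os
import time

# Group retention periods taken from the examples in the text (1 and 7 years).
RETENTION_SECONDS = {"hr": 1 * 365 * 86400, "accounting": 7 * 365 * 86400}


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustrative only, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class PolicyServer:
    """Hypothetical key server: one key per message, plus its access policy."""

    def __init__(self):
        self._keys = {}  # message id -> (key, allowed readers, expiry time)

    def new_key(self, msg_id, readers, ttl_seconds, now=None):
        now = time.time() if now is None else now
        key = os.urandom(32)
        self._keys[msg_id] = (key, set(readers), now + ttl_seconds)
        return key

    def fetch_key(self, msg_id, reader, now=None):
        now = time.time() if now is None else now
        entry = self._keys.get(msg_id)
        if entry is None:
            raise KeyError("key destroyed or never issued")
        key, readers, expires = entry
        if now >= expires:
            del self._keys[msg_id]  # expiry destroys the key for good
            raise KeyError("key expired")
        if reader not in readers:
            raise PermissionError("reader not allowed by policy")
        return key

    def destroy(self, msg_id):
        """Sender-initiated destruction: drop the key, orphaning every copy."""
        self._keys.pop(msg_id, None)


def send(server, msg_id, text, readers, ttl_seconds, now=None):
    key = server.new_key(msg_id, readers, ttl_seconds, now=now)
    return _keystream_xor(key, text.encode("utf-8"))


def read(server, msg_id, ciphertext, reader, now=None):
    key = server.fetch_key(msg_id, reader, now=now)
    return _keystream_xor(key, ciphertext).decode("utf-8")
```

The point of the design is that the ciphertext can be copied anywhere without consequence: once the server forgets the key, every copy on every machine becomes unreadable at the same moment.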


Several companies, including Authentica, Omniva and Sigaba, have been built
around variations of this idea, and Microsoft is promising many of the same
functions in Palladium, its forthcoming digital rights management system.


“It’s technically feasible now to store everything, but it’s socially essential to have a
mute button or cancel or disposal feature of some kind,” says Jim Gray, a 
Distinguished Engineer at Microsoft Research, formerly with DEC. 


Like Mundie, Gray suggests the lifecycle model is tied closely to digital rights man- 
agement (DRM). “If every data object has a policy that describes how it is used and 
how it dies, then you have DRM,” Gray says.


Authentica: Shred-as-you-go 

“We have built a system in which e-mail is sent in its shredded form,” says 
Authentica ceo Lance Urbas. “People have to be able to expire old e-mail, and the 
only way to do that is to encrypt it and manage the keys.” 


Given the continuing flow of e-mail-related horror stories, it is perhaps surprising 
that so little e-mail is sent with any form of rights management. “It is now accepted 
that e-mail is a record, and all the laws and standards that apply to the management 
and retention of electronic records apply to e-mail,” Urbas adds. 


In other words, e-mail expiration is a critical feature, and Urbas believes it is only a
matter of time before it is universal. “How long did it take Detroit to figure out the
need for a restraint system?” Urbas asks. “I’ve been using e-mail for 30 years, and this
[e-mail expiration system] is like airbags.”


AUTHENTICA INFO

Headquarters: Waltham, MA
Founded: 1998
Employees: 45
Funding: undisclosed from 3i US, Greylock, Intel 64 Fund, North Bridge Venture
Partners, Norwest Venture Partners and Venrock
URL: www.authentica.com


After founding the firewall vendor Raptor Systems and selling the company to Axent
(later acquired by Symantec) in 1998, the same team founded Authentica. The company
now has about 40 employees, down from about 70 two years ago. Before joining
Authentica, Urbas held several engineering, sales and marketing management positions
at Veritas, Convex Supercomputer Corporation, Tandem and Data General.


The company initially developed its “Active Rights Management” technology for PDF
documents, which were encrypted with a key that was stored on a server and accessed
according to policies set by the document’s author. That technology is now used to
distribute the daily briefing in the Executive Office of the President, and serves as
the basis for the company’s PageRecall and MailRecall products. The company has also
tweaked its underlying technology to provide secure e-mail messaging with a product
called SafeRoute.


WHY “DELETED” DOESN'T MEAN “DESTROYED”


Data is amazingly persistent, and efforts to deliberately 
delete it often fail. Disk drives are much better at retaining 
information than most people think, and unless data is 
deliberately overwritten many times, it is likely to be 
recoverable. “What you want to disappear will hang 
around forever, and if you want something, it will be gone,” 
says Ben Gross, a visiting scholar at the U.C. Berkeley 
School of Information Management and Systems. 

If you think of a disk as a library, then deleting a 
file is roughly equivalent to pulling its entry from the card 
catalog, but leaving the book on the shelf. Given knowl- 
edge that the book once existed, finding it isn’t all that 
hard. 
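
The card-catalog analogy can be made concrete with a toy model. The sketch below is an assumption-laden simplification (`ToyDisk` and its methods are invented for illustration and do not model any real file system): deleting a file removes only its catalog entry, while a wipe overwrites the blocks first.

```python
class ToyDisk:
    """Toy disk: a catalog (file name -> block numbers) plus raw blocks."""

    def __init__(self, n_blocks=8):
        self.blocks = [b""] * n_blocks
        self.catalog = {}
        self.free = list(range(n_blocks))

    def write(self, name, chunks):
        used = []
        for chunk in chunks:
            block_no = self.free.pop(0)
            self.blocks[block_no] = chunk
            used.append(block_no)
        self.catalog[name] = used

    def delete(self, name):
        # Only the catalog entry goes; the blocks are merely marked free.
        self.free.extend(self.catalog.pop(name))

    def wipe(self, name):
        # Overwrite the contents before releasing the blocks.
        for block_no in self.catalog[name]:
            self.blocks[block_no] = b"\x00" * len(self.blocks[block_no])
        self.delete(name)

    def scan(self, needle):
        """Forensic-style scan of every block, cataloged or not."""
        return [i for i, data in enumerate(self.blocks) if needle in data]


disk = ToyDisk()
disk.write("memo.txt", [b"secret plan ", b"part two"])
disk.delete("memo.txt")
leftover = disk.scan(b"secret")   # the "book" is still on the shelf
```

In this toy a single overwrite is enough to defeat the scan; on real magnetic media, as the rest of this sidebar explains, the picture is messier.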

Recognition of this fact is now fairly widespread. 
In the legal community, it is becoming routine in discovery 
requests to demand access not just to existing, cataloged 
files, but to deleted files as well. And the courts see this as 
legitimate: In Gates Rubber v. Bando Chemical Industries, 
Gates was faulted for not making an image copy that 
included deleted files. In response to this demand, a num- 
ber of companies provide utilities that overwrite data 
many times using unpredictable patterns to ensure it is 
properly destroyed, but even these don’t always work. 
Independent tests of wipe utilities have found gaps in 
their effectiveness. The older, lower layers often peek 
through, rather like the painted-over but still discernible 
advertisements on the sides of buildings. 

“If you have paper letters or correspondence,
once you shred it, it’s gone, but data sticks around forever,”
says Peter Gutmann, author of the standard paper on
data recovery, “Secure Deletion of Data from Magnetic
and Solid-State Memory.” He recalls, “I know of one case
where someone ran a data recovery utility called The
Coroner’s Toolkit on a Linux box, and found Solaris under
that, and Windows 95 under that, and data was still there.”

Joan Feldman, president of Computer Forensics, a 
Seattle-based firm that specializes in recovering data for 
legal reasons, adds that for anyone who has been served 
with a subpoena, wipe utilities will only add to their prob- 
lems. “Wipe utilities leave signatures behind, which raises 
the inference of deliberate destruction,” she explains. 

Feldman says that when drives reach the end of 
their useful life, people should “Pull the drives and drill 
through them in three locations.” (Of course, that too 
leads to the inference of deliberate destruction, but both 
wiping and drilling are legal when routine; neither is legal 
on an ad-hoc basis.) The military specification for 
destroying old data specifically prohibits the use of wiping 
utilities for top-secret data, instead mandating physical 
destruction of the disk platters. 

Finally, even when data is effectively destroyed
on a single PC, it is still subject to the force Carol Lane
described in her book, Naked in Cyberspace: “If information
exists in one place, it exists in more than one place.”
Eliminating every instance of a file or an e-mail message
has become almost impossible in a networked world.

To borrow a phrase from the environmental 
movement, there is no such place as “away.” 


“We applied the rights management concept to documents, and then to e-mail,” says
Authentica’s director of product marketing, Vic DeMarines. “We’re starting to cross
the chasm now, and the real driver has been regulation; that has created enough
momentum for organizations to take secure e-mail seriously.”


While the overall concept is fairly simple, the implementation is hard for both legal
and technical reasons. Legally, it’s critical that no data be accidentally deleted, and
that customers can take action to preserve data. Technically, it’s important that the
system be nearly transparent to end users; security mechanisms are ignored if they
interfere with how people read, write or send mail.


Authentica has cut an interesting path through the conflicting needs of end users to
occasionally recall ill-considered messages, and legal requirements that e-mail be 
preserved. If a sender opts to recall a message, it can’t be read by the person to whom 
it was sent. However, the key to access that message is retained according to the com- 
pany’s e-mail policy. In effect, it’s a kind of key escrow. 
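
The distinction between recalling a message and destroying its key can be sketched as follows. This is a hypothetical illustration of the escrow idea only; `EscrowKeyServer` and its methods are invented names, and nothing here is drawn from Authentica’s actual product.

```python
class EscrowKeyServer:
    """Hypothetical key server where recall blocks readers but keeps the key."""

    def __init__(self):
        self._entries = {}  # message id -> key, retention deadline, recall flag

    def register(self, msg_id, key, retain_until):
        self._entries[msg_id] = {
            "key": key, "retain_until": retain_until, "recalled": False,
        }

    def recall(self, msg_id):
        # The recipient loses access; the key itself is left untouched.
        self._entries[msg_id]["recalled"] = True

    def recipient_key(self, msg_id):
        entry = self._entries.get(msg_id)
        if entry is None or entry["recalled"]:
            return None
        return entry["key"]

    def compliance_key(self, msg_id):
        """Access path for records managers, ignoring the recall flag."""
        entry = self._entries.get(msg_id)
        return None if entry is None else entry["key"]

    def purge_expired(self, now):
        """Destroy keys only once the company's retention period has run out."""
        expired = [m for m, e in self._entries.items() if e["retain_until"] <= now]
        for msg_id in expired:
            del self._entries[msg_id]
        return expired
```

Separating the two access paths is what lets a sender take back an ill-considered message without the company appearing to have destroyed a record ahead of schedule.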


Another application for this type of rights management is in archiving, DeMarines 
says. Currently, destruction of data on optical storage systems is an all-or-nothing 
proposition. But in practice, optical platters contain documents that are on different 
retention schedules. Rights management provides a way to implement those sched- 
ules without destroying the optical media. 


Urbas believes Authentica’s technology can change other business interactions. For 
example, companies can release a request for proposal and then recall it from the 
companies who aren’t selected. He also sees the technology moving into other
communications channels. “We are not just focused on e-mail, but on business
applications,” Urbas says. “People are thinking now about how they’re using e-mail, and
right behind that is instant messaging.” 


Source Reduction: An Ounce of Prevention 


For garbologists, one of the most promising approaches to reducing the amount of 
waste that goes into landfills is source reduction: Stopping the flow and reducing the 
toxicity of garbage at the source through changes in product design and manufac- 
turing. It’s an apt metaphor for companies that are working to block the arrival of 
spam, popup ads, and other information we don’t want to receive in the first place. 


Spam is arguably the worst of the Net’s garbage problems: The largest anti-spam 
vendor, Brightmail, says 45 percent of the messages it processed in March were clas- 
sified as spam, up from 8 percent at the end of 2001. Postini says that 65 percent of 
the mail its subscribers receive is spam. And AOL reports it filters 1.2 billion spam 
messages per day — nearly 50 per subscriber. 


The spam problem is worsening because the volume is increasing, and because spammers
are now targeting mobile devices, cell phones and instant messaging clients.
Spammers are also becoming much more sophisticated in circumventing anti-spam
filters, often by subscribing to anti-spam services and then mutating their messages
when they see these services have been retuned to catch their latest messages. It’s
rather like the anti-virus business, which is continually coping with new threats, but 
worse, because spammers have an economic incentive. 


As the severity of the problem has increased, so has the number of companies rush- 
ing to capitalize on a solution, with Brightmail, Postini, CipherTrust and 
MessageLabs leading the industry, and at least twenty other vendors vying for a piece 
of the market. That doesn’t even count the non-profit collaborative efforts, such as 
the Realtime Blackhole List, SPEWS, and the Spam Archive, which asks the public to 
“donate your spam to science” so that it can be used for the development of anti-spam
tools.


Marten Nelson, an analyst at Ferris Research, estimates anti-spam vendors will take 
in about $55 million in 2003, and that companies will pay an average of $6 per seat
per year. “This market has a few early leaders, and lots of new followers,” Nelson says. 


(DISCLOSURE: JEFF UBOIS IS A CONSULTING ANALYST WITH FERRIS RESEARCH.) 


Broadly speaking, these anti-spam vendors can be classed by where they filter out 
unwanted messages. The larger companies, and the ones likely to succeed in enter- 
prise and service provider environments, filter spam at the gateway, and typically 
charge $5 to $30 per user per year. The companies that filter spam on end-users’ 
machines may have an easier time marketing to individuals, but are unlikely to suc- 
ceed in large enterprises. While there is nothing in principle to keep a company from 
offering both approaches, the market leaders focus on the gateway because enterprise
customers are resistant to solutions that require maintenance and updating of
client software. 


Other approaches to controlling spam, such as whitelisting, sender pays (cf 
Vanquish), and challenge/response are likely to be folded into existing products and 
services as options. While each of these approaches holds some promise, they place 
the burden of defining spam back on end users. 


Brightmail: Billions and billions (not) served 

Already an expensive problem for ISPs and service providers, spam is now perceived 
as a threat to corporate productivity, and that is driving sales of spam-control solu- 
tions. “We’re definitely crossing the threshold because companies are now willing to 
pay to solve the spam problem,” says Brightmail ceo Enrique Salem. 


17 APRIL 2003 RELEASE 1.0 




BRIGHTMAIL INFO

Headquarters: San Francisco, CA
Founded: April 1998
Employees: 100
Funding: more than $40 million from Accel Partners, Technology Crossover
Ventures, Crosslink Capital, Thomas Weisel Partners and Symantec
Key metric: 250 million users
URL: www.brightmail.com


Founded by a team led by Sunil Paul, Brightmail has seen its business expand as the
severity of the spam problem has increased. The company currently has $22 million in
the bank, was profitable in the first quarter of 2003, and expects 100 percent revenue
growth this year. (DISCLOSURE: DAPHNE KIS, ESTHER DYSON AND JEFF UBOIS WERE ALL EARLY
INVESTORS IN BRIGHTMAIL.)


The company now processes about 55 billion messages per month, 
and filters almost 10 percent of the world’s e-mail. Customers 
include Earthlink, AT&T Worldnet, MSN, and Motorola, and 
though the company can’t talk about it, it is common knowledge
within the anti-spam community that Brightmail’s largest customer
in terms of mail volume is Hotmail. Overall, Brightmail has 600 ISP and large enter- 
prise customers with 250 million end users, who Salem says care about “liability, 
productivity, storage, and security.” 


Brightmail’s initial approach to spam control relied on setting up hundreds of
decoy mailboxes around the Net, and then posting these addresses in places likely to
attract the attention of spammers. Since the addresses didn’t belong to real people,
any mail they got was spam. This network of “spam probes” now comprises millions
of mailboxes and receives 140 spam messages per second. As new spam is
detected in the probe network, Brightmail reconfigures the rules used in its spam fil- 
ters in real time. 
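
The probe-network idea can be illustrated with a short sketch. The decoy addresses, the `ProbeNetwork` class and the exact-match fingerprinting are all assumptions chosen for brevity; Brightmail’s real rules are not described in enough detail here to reproduce, and are presumably far more tolerant of mutated messages.

```python
import hashlib

# Hypothetical decoy addresses that no real person uses.
DECOY_ADDRESSES = {"probe1@example.net", "probe2@example.net"}


def fingerprint(body: str) -> str:
    """Normalize whitespace and case, then hash, so trivial variations match."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ProbeNetwork:
    def __init__(self):
        self.spam_fingerprints = set()

    def receive(self, recipient: str, body: str) -> None:
        # Decoys have no human owner, so anything they receive is treated as spam.
        if recipient in DECOY_ADDRESSES:
            self.spam_fingerprints.add(fingerprint(body))

    def is_spam(self, body: str) -> bool:
        """Check a customer's inbound message against what the probes have seen."""
        return fingerprint(body) in self.spam_fingerprints
```

Because the rule set is updated the moment a decoy receives something, a filter of this kind needs no human to label the first copy of a new campaign.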


The company is refining its approach to filtering spam by developing algorithms 
that can learn by example, and has cut its false-positive rate to one per million mes- 
sages classified as spam. The company is preparing to move into other markets such 
as wireless and instant messaging. 


Postini: Protecting the perimeter 

Another way to frame the spam problem is as an intrusion, i.e. a security issue. 
That’s the approach taken by Postini of Redwood City, CA, which describes its ser- 
vice, the Postini Perimeter Manager, as an e-mail firewall. This approach has allowed 
the company to identify and solve an “upstream” problem, preventing spammers 
from harvesting the e-mail addresses of end users in the first place. (DISCLOSURE: JEFF 
UBOIS WORKED AS A CONTRACTOR AT POSTINI IN 2002.)


Scott Petry, Postini’s vp of products and engineering, explains that spammers now
harvest addresses by attempting to deliver millions of messages to every conceivable 
address on a network — aaaa@yourcompany.com, aaab@yourcompany.com, and so 
on. Messages that don’t bounce are assumed to have reached a valid address. These 
“directory harvest attacks” place a tremendous load on corporate mail servers, and 
Postini’s service intercepts them. “They probe to see who is there, and then send tar- 
geted messages,” Petry says. 
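
One simple way to spot a directory harvest attack is to count how many nonexistent recipients each sending host asks about. The sketch below assumes a tiny directory and an arbitrary threshold (`VALID_USERS`, `MAX_UNKNOWN` and `HarvestGuard` are hypothetical); it is not a description of Postini’s actual detection logic.

```python
from collections import defaultdict

VALID_USERS = {"alice", "bob"}   # hypothetical corporate directory
MAX_UNKNOWN = 3                  # assumed threshold, not a Postini figure


class HarvestGuard:
    def __init__(self):
        self.unknown_counts = defaultdict(int)  # sending host -> bad recipients seen
        self.blocked = set()

    def check_recipient(self, host: str, address: str) -> str:
        if host in self.blocked:
            # Once blocked, the host learns nothing more, even about valid users.
            return "reject"
        local_part = address.split("@", 1)[0].lower()
        if local_part in VALID_USERS:
            return "accept"
        self.unknown_counts[host] += 1
        if self.unknown_counts[host] >= MAX_UNKNOWN:
            self.blocked.add(host)
        return "unknown-user"
```

Doing this check upstream of the customer’s mail server is also what spares that server the load the article describes.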


Founded by Apple veteran Scott Petry and Shinya Akimine, both previously with
Cygnus Solutions, the company has over 1200 customers (mostly ISPs) and processes
about 70 million messages a day for 4 million end users. While the company’s initial
growth was in the ISP market, Petry says most of the growth in the last year has been
in the corporate market.


POSTINI INFO

Headquarters: Redwood City, CA
Founded: March 1999
Employees: 60
Funding: $26 million from August Capital, Sun Microsystems and Mobius Venture Capital
Key metric: 4 million users
URL: www.postini.com


Rather than install a gateway box at customer sites, Postini routes its customers’
messages through its own servers, and then forwards them. This allows the company to
control the SMTP connection stream, and scan for inbound viruses as well as spam.
Postini has partnered with anti-virus companies Trend Micro and McAfee.


To use the service, customers change their primary MX record, which tells other mail
servers how to route mail sent to you@yourcompany.com, to
point to Postini’s servers, where it is filtered and forwarded. Suspected spam is 
tagged with an x-header, showing a probable value of legitimacy based on a variety 
of factors such as where it came from, how it was addressed, and message contents. 
This tag can be used by the customer’s mail server or by Postini to quarantine, redi- 
rect, or dispose of a message according to policies tuned to an entire organization, a 
single department or an individual. Quarantined e-mail can be reviewed by the 
intended recipient or the system administrator using a secure Webmail interface. 
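
The tag-then-apply-policy flow described above might look roughly like this. The header name `X-Legitimacy-Score`, the 0 to 100 scale and the thresholds are all assumptions for illustration; the article does not give Postini’s actual header or scoring scheme.

```python
from email.message import EmailMessage

HEADER = "X-Legitimacy-Score"  # hypothetical header name

# Minimum legitimacy score required for delivery; the most specific level wins.
POLICIES = {
    "organization": 50,
    "department": {"sales": 30},
    "individual": {"ceo@yourcompany.com": 80},
}


def threshold_for(recipient: str, department: str) -> int:
    if recipient in POLICIES["individual"]:
        return POLICIES["individual"][recipient]
    if department in POLICIES["department"]:
        return POLICIES["department"][department]
    return POLICIES["organization"]


def tag(msg: EmailMessage, score: int) -> EmailMessage:
    """Filtering service side: record the score in an x-header."""
    msg[HEADER] = str(score)
    return msg


def disposition(msg: EmailMessage, department: str) -> str:
    """Customer side: read the tag and apply the applicable policy."""
    score = int(str(msg[HEADER]))
    recipient = str(msg["To"])
    if score >= threshold_for(recipient, department):
        return "deliver"
    return "quarantine"
```

Keeping the score in a header, rather than dropping mail outright, is what allows the same message to be delivered to a sales team that tolerates noise and quarantined for an executive who does not.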


Routing inbound mail through its own servers also allows Postini to provide a backup
service that will spool inbound messages in the event that a customer’s e-mail
system goes down or is unreachable. Once the situation has stabilized, the spooled
e-mail can be delivered at a throttled rate to ensure that the server doesn’t get
overwhelmed and go down again.


Intermute: Saving lives with good sanitation

Intermute’s founder and CEO Ed English isn’t shy about describing the value of his
company’s AdSubtract software, which has blocked more than 12 billion ad impressions
to date. Assuming an average download speed of 2-3 kilobytes per second (and, as the
arithmetic implies, an average ad size of about 10 kilobytes), the software has saved
somewhere between 1268 and 1903 years of end user time, or roughly 20 to 30 human
lifetimes. At an average CPM of $5, that’s $60 million in advertising revenue.
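
The arithmetic behind these figures can be checked with a short Python calculation. The average ad size of 10 kilobytes is an assumption: it is not stated in the text, but it is the value that reproduces the quoted range when a year is counted as 365 days.

```python
IMPRESSIONS = 12_000_000_000          # blocked ad impressions, from the article
AD_SIZE_KB = 10                       # assumption: inferred, not stated in the article
SECONDS_PER_YEAR = 365 * 24 * 3600    # 365-day year


def years_saved(speed_kb_per_s: float) -> float:
    """Total download time avoided, in years, at a given connection speed."""
    return IMPRESSIONS * AD_SIZE_KB / speed_kb_per_s / SECONDS_PER_YEAR


def revenue(cpm_dollars: float) -> float:
    """Advertising revenue for the blocked impressions at a cost per thousand."""
    return IMPRESSIONS / 1000 * cpm_dollars


print(round(years_saved(3)), round(years_saved(2)))  # 1268 1903
print(revenue(5))                                    # 60000000.0
```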


“The purpose of a popup is to interrupt and distract you,” English says. “We are not 
against advertising, but it has clearly gone beyond the bounds of what is reasonable 
—no company would allow any cold caller to walk in and talk to anyone, but that’s 
what we do now.” 


Bundled with more than 85 percent of the modems sold in the US, and available as a 
boxed product through major retailers for $29.95, AdSubtract blocks banner ads, 
popups, popunders, music and Javascript to conserve users’ bandwidth, time and 
attention. It highlights cookies that are used for profiling users over
time and across sites, and allows users to delete them and set preferences
on how they are managed.

INTERMUTE INFO

Headquarters: Braintree,
Founded: 1999
Employees: 20
Key metric: blocked more than 12 billion ad impressions to date
Funding: undisclosed
URL: www.intermute.com
The company also offers anti-spam client software called
SpamSubtract, and two free products: PopSubtract, which eliminates
popup and popunder ads, and MessageSubtract, which blocks
Windows messaging popups. Windows messaging popups use a feature
built into Windows XP and Windows 2000 that was intended to
allow system administrators to broadcast messages in corporate
environments (“The server is down.”). Windows messaging is now
used by spammers who simply barrage large numbers of IP addresses with unwanted
messages. For online gamers, these messages can be “lethal” to their characters.


However, English is no absolutist; he is quick to point out that cookies can help end 
users by storing preferences and help site owners understand and improve the expe- 
rience of their users. He also points out that advertisers using networks like 
DoubleClick are not typically charged for impressions that don’t actually take place, 
because when AdSubtract blocks an ad, it does so by intercepting the request being 
sent back to the ad network. Since no request reaches the ad network, the advertisers 
aren't charged for an impression. 
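
The reasoning can be made concrete with a small Python sketch of a client-side filter that drops requests to ad hosts before they leave the machine. It is not AdSubtract's implementation; the blocklist entries and function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; a real product would ship and update its own.
AD_HOSTS = {"ads.example.com", "adnetwork.example"}


def is_blocked(url: str, ad_hosts=AD_HOSTS) -> bool:
    """True if the URL's host is a listed ad host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in ad_hosts)


def fetch(url: str, forward):
    """Forward the request unless it targets an ad host.

    A blocked request is never sent, so the ad network never sees it
    and has no impression to record.
    """
    if is_blocked(url):
        return None
    return forward(url)
```

Because the decision is made before any network traffic, the ad network's logs show nothing for the blocked ad, which is why no impression is billed.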
As with spam, there will be an ongoing conflict between people who want to control
their own machines, and people who want to push out their marketing messages. 
“We continue to see new ways to push ads into people’s face, to hijack their browser 
and get their attention,” English says. “And we will continue to develop tools to 
defeat these unwanted intrusions into people’s productivity.” 


Recycling: Old Code and Innovation 


In the world of data garbage, old code is the equivalent of glass bottles and aluminum
cans; that is, it consists of relatively high-value bits and has the greatest potential for
recycling. While software reuse has been promised as a benefit by a long line of
advanced software engineering efforts and technologies, from Ada to CORBA to
JavaBeans, it’s really in the open source world and on the Web that code recycling
has paid off.


“The reason Linux happened is because there were all these pieces of Unix lying 
around,” says Tim O'Reilly, founder of O'Reilly and Associates. “All the user-level 
stuff was there to be recycled, which is why a couple of guys could write what 
appears to be a new OS.” For open-source advocates like O'Reilly, there is a positive 
duty to recycle to ensure continued innovation by lowering the barriers to entry into 
new markets. “The obligation to recycle is about cultivating a fertile ground for 
innovation by giving away what is no longer useful to you,” O'Reilly says. “Most code 
is not recyclable, but pieces of it are, and you need those pieces lying around that you 
can assemble so you don't have to do everything yourself.” 


Conversely, locking up code and other intellectual property unnecessarily creates 
hidden costs. Companies are often tempted to hang on to code they don't use or 
need in the hope of recouping some value later, but that's shortsighted, according to 
O'Reilly. “If you have a clear and present profit motive for hanging on to something, 
fine, but there is this foolish clutching at things that only might be valuable,” he 
claims. “When other people make things of value from what you have thrown away, 
you get some of that value back.” 


For example, when the group developing the Web server at the National Center for
Supercomputing Applications (NCSA) at the University of Illinois left to pursue
commercial opportunities, users banded together to create Apache. NCSA's mission, to
support research and education, therefore continued even after it stopped contributing
resources. “The question is what would happen in corporate environments if
people were willing to give away abandoned products.”


The release of abandoned products has also created useful products such as the 
Mozilla browser, which is based on code developed by Netscape. And to go back a lit- 
tle further, the anti-trust action that prevented AT&T from going into the computer 
business was what allowed the company to release the Unix code for development 
within the academic community. Not only have Linux and other open-source prod- 
ucts returned value to AT&T, releasing the code has reduced the price AT&T pays for 
software. While Mozilla and Unix prove O'Reilly's point that recycling can
improve the industry, Netscape and AT&T weren't able to monetize the value of 
their contribution. 


O'Reilly suggests that established companies need a better understanding of open 
source, and the potential for services built on open source products. “AT&T tried to 
profit by trying to rein in the community development of Unix, turning it into a tra- 
ditional software licensing business,” O'Reilly says. “Entrepreneurs went out and 
delivered services enabled by the free software that the Unix and associated network- 
ing communities had developed. The entire ISP industry was spawned out of the 
commercialization of the UUCP suite of networking protocols and the associated 
Usenet software (later superseded by TCP/IP).” 


Corporations can also benefit from recycling their old code with Web services (see 
RELEASE 1.0, SEPTEMBER AND OCTOBER 2001) that mine the value in legacy systems. 
Rather than assembling aging code modules, Web services repackage systems that 
might otherwise be discarded. 


Companies can also benefit from data mining applications that process what might 
otherwise be garbage (e.g. from transactional systems, or web usage log files) into 
valuable insights about their businesses. 
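
As a small example of turning log "garbage" into something useful, the Python sketch below counts the most requested pages in a Web server log. It assumes lines in the Common Log Format; the sample paths are made up for illustration.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line.
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def top_paths(lines, n=3):
    """Return the n most requested paths among successful (2xx) requests."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status").startswith("2"):
            hits[match.group("path")] += 1
    return hits.most_common(n)


sample_log = [
    '10.0.0.1 - - [17/Apr/2003:10:00:00 -0700] "GET /pricing HTTP/1.0" 200 5120',
    '10.0.0.2 - - [17/Apr/2003:10:00:05 -0700] "GET /pricing HTTP/1.0" 200 5120',
    '10.0.0.3 - - [17/Apr/2003:10:00:09 -0700] "GET /old-page HTTP/1.0" 404 310',
    '10.0.0.1 - - [17/Apr/2003:10:01:00 -0700] "GET /signup HTTP/1.0" 200 2048',
]
print(top_paths(sample_log))  # [('/pricing', 2), ('/signup', 1)]
```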


But as companies increase their efforts to recover residual value from old bits, old 
code, old rights, and other intellectual property, they may be poisoning their own 
environment like yeast in a wine bottle. 


Even today, patent portfolios are assembled by large companies not simply to reap 
licensing revenues, but to provide the basis for counterclaims when others sue for 
patent infringement. As patent and copyright terms are extended, their original pur- 
pose — to promote the circulation of knowledge — is perverted. The dynamic that 
drives companies to accumulate intellectual property that neither they nor anyone
else can use will have long-term negative consequences for innovation. Throwing
away what is no longer needed, i.e. releasing ownership rights that aren’t being exercised,
is a voluntary approach to ensuring continued innovation.


Garbage and Our Future 


The assumption that all data is good and should be preserved is gradually proving
false: Data may be an asset or a liability, damning or exculpatory, depending on its
context. Conflicts between important social goods (like accountability and privacy,
or private property and the commons) mean there will never be static solutions to
defining and managing digital garbage. It’s trite, but one person’s trash really is
another’s treasure.


Yet the risks associated with ever-growing, unmanaged collections of information are
increasing as fast as or faster than the amount of data we have available. Even as it becomes
technically feasible and even legally required to save nearly everything, the need for
effective ways to dispose of unwanted data is becoming more acute, if
for no other reason than to reduce error. As Mark Twain said, “a lie
can travel halfway around the world while truth is putting on its
shoes,” and that is certainly true of bad data.


What’s most surprising is how little attention the computer industry
and corporate America have given to the final phase of the information
lifecycle. Our metaphors and definitions aren’t equal to the
task. While lawyers and records managers have been sounding
the alarm for years, and an ever-growing number of companies have
landed in serious civil and criminal difficulties as a result of unsanitary
practices, there seems to be an innate resistance to thinking
about the issue.


Most people don’t like to think about real garbage either: Americans
lead the world in garbage production, creating about 230 million
tons of it in 1999, or 4.6 pounds per person per day, up from 2.7
pounds in 1960, according to the Environmental Protection Agency. As Rathje
writes, “The American dream is to expand, grow and consume, so for anything
about garbage to change, the American way of thinking has to change, and that is
awfully hard to do.”


COMING SOON

• Weblogs and publishing.
• Enum.
• Non-homeland security.
• Language tools.
• Location-based services.
• And much more... (If you know of any good examples of
the categories listed above, please let us know.)


There’s a deep reason we resist thinking about garbage, Rathje suggests — it forces us 
to confront mortality. Perhaps then it’s no surprise that a Buddhist temple in Kyoto, 
Japan offers an annual prayer for lost "information" every October 24 (the date 
10/24 was selected to mirror the 1024 bits in a kilobit). As the site puts it, “... there are
many ‘living’ documents and softwares that are thoughtlessly discarded or erased 
without even a second thought. It is this thoughtlessness that has drawn the concern 
and attention of Head Priest Shokyu Ishiko.” 


Deep existential issues aside, information systems need good sanitation, just as cities 
do. As the consequences of bad sanitation become more apparent, the companies 
that can crack the problem (perhaps through new approaches based on a kind of
environmentalism) will be the ones that clean up.
Resources & Contact Information


Peter Gutmann, pgut001@cs.auckland.ac.nz; www.cs.auckland.ac.nz/~pgut001/ 

Lance Urbas, Authentica, 1 (781) 487-2600; lurbas@authentica.com 

Enrique Salem, Brightmail, 1 (415) 365-5002; esalem@brightmail.com 

Joan Feldman, Computer Forensics, 1 (206) 324-6232; jfeldman@forensics.com 

Mehdi Maghsoodnia, Glen Vondrick, FaceTime, 1 (650) 574-1600; fax, 1 (650) 574-2700 
Mike Overly, Foley & Lardner, 1 (310) 975-7959; moverly@foleylaw.com

Donald Skupsky, Information Requirements Clearinghouse, 1 (303) 721-7500; dskupsky@irch.com 
Ed English, Intermute, 1 (781) 356-3077; ede@intermute.com 

Ken Rubin, Iron Mountain, Krubin@ironmountain.com 

Jim Gray, Microsoft, 1 (415) 778-8222; gray@microsoft.com 

Craig Mundie, Microsoft, 1 (425) 706-2547; fax, 1 (425) 936-7329; craigmu@microsoft.com 
Bob Johnson, National Association for Information Destruction, 1 (602) 788-6243 

Scott Petry, Postini, 1 (650) 216-3554; petry@postini.com 

William L. Rathje, Stanford University, 1 (650) 736-2415; wirathje@aol.com 

Roger Erickson, Zantaz, 1 (925) 598-3000; fax, (925) 598-3145; rerickson@zantaz.com


For further reading: 

William Rathje, Cullen Murphy (Contributor), Rubbish!: The Archaeology of Garbage, (HarperCollins, 1992); 
http://www.amazon.com/exec/obidos/tg/detail/-/0816521433/qid=1050347524 

Judge James Rosenbaum, “In Defense of the Delete Key,” http://www.greenbag.org/rosenbaum_deletekey.pdf 

File wipe utility tests: http://www.veritest.com/clients/reports/redemtech/redemtech.pdf; 
http://www.seifried.org/security/advisories/kssa-003.html

Peter Gutmann, “Secure Deletion of Data from Magnetic and Solid-State Memory,”
http://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html 

Simson L. Garfinkel and Abhi Shelat, “Remembrance of Data Past,” 
http://www.computer.org/security/v1n1/garfinkel.htm

Spam Radio (recycling spam into entertainment): http://spamradio.com 

Prayers for Lost Information: http://www.thezen.or.jp/jomoh/kuyo.html 


ARMA International (formerly the Association of Records Managers and Administrators): http://www.arma.org
MAY 1-3
Nantucket Conference on Entrepreneurship and Innovation - Nantucket,
MA. An invitation-only conference focused on entrepreneurship in New
England. Speakers include Ray Ozzie, Ed Zander, Bob Metcalfe and 2003 PC
Forum company presenter Francis DeSouza. www.nantucketconference.com

MAY 2
Good Experience Live - New York, NY. Organized by Mark Hurst, founder of
customer experience consulting firm Creative Good. GEL will gather a diverse
set of speakers to explore what it means to create a good, meaningful, or
authentic experience. Register online or e-mail updates@goodexperience.com.
www.goodexperience.com/gel

MAY 4-6
SIIA Annual Conference - San Francisco, CA. Organized by the Software &
Information Industry Association, this annual conference brings together
code, content and the industry. Register online or call Beckie Lake, (202) 289-7442
x1373. www.siia.net/annual2003/

MAY 4-7
CIO Forum Financial Services - New York, NY. Strategic IT forum for the
US financial services industry. Presented by Richmond Events. This year's event
takes place on board P&O's newest ocean liner Adonia, sailing from New York
City. For information, visit the Website or call 1 (212) 651-8700; fax, 1 (212)
651-8701. www.cioforum.com

MAY 12-14
AeA Micro Cap Financial Conference - Monterey, CA. The American
Electronics Association sponsors an opportunity for public technology companies
to showcase their companies to the investment community. To register,
contact Tina Morais, (408) 987-4234; fax, (408) 727-7057;
tina_morais@aeanet.org. www.aeanet.org/microcap/

MAY 13-16
GigaWorld IT Forum - Phoenix, AZ. Giga Information Group's flagship
event, addressing the issues facing managers of technology. Register online or
call (781) 792-2669; e-mail, events@gigaweb.com. www.gigaworldus.com

MAY 15-16
TV Meets the Web - Amsterdam, The Netherlands. This year's theme is
"Digital Media: The Path to Profitability" and will cover video on demand,
SMS TV, content billing, DRM, and more. Contact Vanessa Vigar,
vanessa@vandusseldorp.com or +31 (20) 535-6979.
www.tvmeetstheweb.com/2003/

MAY 18-20
Vortex 2003 - Dana Point, CA. An invitation-only event where executives
from the telecom, Internet and data-networking industries gather to discuss
the future of networking. Request an invitation online at
www.idgexecforums.com/vortex/register.html. www.idgexecforums.com/vortex/

MAY 19-22
Future in Review 2003 - San Diego, CA. Organized by Mark Anderson's
Strategic News Service, this invitation-only event peers into its crystal ball to
predict the next 3-5 years of the technology industry. To request an invitation,
contact Susan Schwinge, 1 (360) 378-1023; susan@tapsns.com.
www.futureinreview.com
Calendar of High-Tech Events

MAY 20-24
Twelfth Annual World Wide Web Conference - Budapest, Hungary. Discuss
the latest developments in web technology and the issues and challenges facing
the web community. For more information, visit the website or e-mail
info@www2003.org. www2003.org

MAY 22
Global Wireless Summit 2003 - New York, NY. For carriers, service and
content providers and hardware manufacturers in the wireless industry.
Keynotes from John Zeglis (CEO of AT&T Wireless) and Kathleen Abernathy
(FCC Commissioner). Register online. consect.com/NY2003/

JUNE 11-13
TedMed3 - Philadelphia, PA. Discover how technology can help you achieve a
healthier life. Imagine! Register online or call (401) 848-2299; e-mail,
wurmanrs@aol.com. www.tedmed.com

JUNE 18-20
CeBIT - New York, NY. Europe's biggest technology trade show comes to
America. For information about registering or exhibiting visit the website.
www.cebit-america.com

JUNE 23-27
IPv6 Global Summit - San Diego, CA. The not-to-miss event for the IPv6 set.
Register online or contact Alex Lightman, alex@charmed.com.
www.usipv6.com

JUNE 23-27
ATPN 2003 - Eindhoven, The Netherlands. There's much to be learned
about networked systems from biology. Discover the wisdom of Petri Nets at
the International Conference on Application and Theory of Petri Nets, in its
24th year. Register online or e-mail atpn2003@tue.nl. www.tue.nl/atpn2003

JUNE 26-27
UpStart Europe 2003 - London, UK. In its fourth year, UpStart will give
technology entrepreneurs in Europe that get-up-and-go feeling. Register
online or call +31 (20) 462-1983. www.tornado-insider.com/upstarteurope

JULY 7-11
O'Reilly Open Source Convention - Portland, OR. A central gathering place
for the open source community. Register online or call Linda Holder, (800)
998-9938 or (707) 827-7000 (outside the US); fax, (707) 829-1342;
lholder@oreilly.com. conferences.oreillynet.com/os2003/

JULY 8-9
Supernova - Washington, DC. The second annual Supernova conference
aims to explore the path to decentralization of communications, software and
media. Featuring Reed Hundt and PC Forum 2003 speaker Maria Martinez.
Register online or call (631) 547-0800. www.pulver.com/supernova/

JULY 15-17
AO2003: The Innovation Summit - Palo Alto, CA. At this invite-only event
leaders from the technology industry, media, academia and government will
explore the always-on world. Speakers include Esther Dyson, Michael Dell, PC
Forum 2003 speaker Marc Benioff and Shimon Peres. US President Bush is
also said to be considering his invitation to speak. Join the AO online to receive
your invitation. www.alwayson-network.com/summit.php *

* Events Esther plans to attend.


Lack of a symbol is no indication of lack of merit. The full, current calendar is available on our Website, www.edventure.com. 
Please contact Christina Koukkos (christina@edventure.com) to let us know about other events we should include. 
The conversation never stops! Subscribe to our free e-mail newsletter, The conversation
continues, for thought-provoking analysis from our editors, along with commentary from our
highly intelligent readers. Sign up at http://www.edventure.com/conversation/join.cfm.